Overview

Dataset statistics

Number of variables: 8
Number of observations: 1304
Missing cells: 309
Missing cells (%): 3.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 81.6 KiB
Average record size in memory: 64.1 B
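
The two derived figures in this table follow from the raw counts. The snippet below is a minimal sketch of that arithmetic, assuming that 1 KiB = 1024 B and that the missing percentage is taken over all cells (observations × variables). The variable names are illustrative and are not part of the report.

```python
# Values copied from the dataset statistics above.
n_variables = 8
n_observations = 1304
missing_cells = 309
total_size_kib = 81.6  # assumption: 1 KiB = 1024 bytes

# Share of missing cells over all cells (observations x variables).
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Average number of bytes per observation.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Both results round to the values shown in the table (3.0% and 64.1 B).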

Variable types

NUM: 7
CAT: 1

Warnings

[Missing] other_rooms has 309 (23.7%) missing values
[Unique] df_index has unique values
[Zeros] bathrooms has 15 (1.2%) zeros
[Zeros] other_rooms has 977 (74.9%) zeros
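
The counts behind the missing and zero warnings can be recomputed with a few lines of code. This is a sketch only: the helper name `missing_and_zero_share` is hypothetical, the column is rebuilt synthetically from the other_rooms value counts listed later in this report, and the thresholds pandas-profiling uses to decide when to raise a warning are not shown here and are not modeled.

```python
import math


def missing_and_zero_share(values):
    """Return (missing count, missing %, zero count, zero %) for one column."""
    n = len(values)
    n_missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    n_zeros = sum(1 for v in values if v == 0)  # NaN == 0 is False
    return n_missing, 100 * n_missing / n, n_zeros, 100 * n_zeros / n


# Synthetic column rebuilt from the other_rooms value counts in this report.
other_rooms = [0] * 977 + [1] * 12 + [2] * 4 + [3] + [4] + [float("nan")] * 309

n_missing, pct_missing, n_zeros, pct_zeros = missing_and_zero_share(other_rooms)
print(f"{n_missing} ({pct_missing:.1f}%) missing, {n_zeros} ({pct_zeros:.1f}%) zeros")
```

Note that both percentages are taken over all 1304 observations, including the missing ones.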

Reproduction

Analysis started: 2021-06-05 21:03:49.897079
Analysis finished: 2021-06-05 21:03:59.866774
Duration: 9.97 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 1304
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15850.53758
Minimum: 286
Maximum: 21727
Zeros: 0
Zeros (%): 0.0%
Memory size: 10.2 KiB

Quantile statistics

Minimum: 286
5-th percentile: 9117.8
Q1: 13410.5
median: 16353.5
Q3: 18949.25
95-th percentile: 20292.35
Maximum: 21727
Range: 21441
Interquartile range (IQR): 5538.75

Descriptive statistics

Standard deviation: 3827.061894
Coefficient of variation (CV): 0.24144682
Kurtosis: 2.443174668
Mean: 15850.53758
Median Absolute Deviation (MAD): 2782
Skewness: -1.244995471
Sum: 20669101
Variance: 14646402.74
Monotonicity: Not monotonic
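
Several of the quantile and descriptive statistics are derived from others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the CV is the standard deviation divided by the mean, and the MAD is the median of absolute deviations from the median. The sketch below recomputes them with the Python standard library. The function name `derived_stats` and the toy data are illustrative, and the use of the sample standard deviation (n - 1) and linear quantile interpolation are assumptions about how the report was computed.

```python
import statistics


def derived_stats(values):
    """Recompute range, IQR, CV and MAD from raw values."""
    # method="inclusive" interpolates linearly between order statistics.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumption)
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "cv": std / mean,
        "mad": statistics.median(abs(v - median) for v in values),
    }


toy = [1, 2, 3, 4, 5]  # illustrative data, not taken from the report
stats = derived_stats(toy)
print(stats)

# Cross-check against the df_index figures reported above.
reported_iqr = 18949.25 - 13410.5        # Q3 - Q1
reported_range = 21727 - 286             # Maximum - Minimum
reported_cv = 3827.061894 / 15850.53758  # Standard deviation / Mean
print(reported_iqr, reported_range, round(reported_cv, 4))
```

The cross-check reproduces the reported IQR (5538.75), range (21441) and CV (about 0.2414) for df_index.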
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
12285 | 1 | 0.1%
17133 | 1 | 0.1%
19147 | 1 | 0.1%
21199 | 1 | 0.1%
21200 | 1 | 0.1%
13010 | 1 | 0.1%
13013 | 1 | 0.1%
19160 | 1 | 0.1%
21209 | 1 | 0.1%
21211 | 1 | 0.1%
Other values (1294) | 1294 | 99.2%

Minimum 5 values
Value | Count | Frequency (%)
286 | 1 | 0.1%
287 | 1 | 0.1%
299 | 1 | 0.1%
419 | 1 | 0.1%
577 | 1 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
21727 | 1 | 0.1%
21617 | 1 | 0.1%
21608 | 1 | 0.1%
21565 | 1 | 0.1%
21506 | 1 | 0.1%

property_status
Categorical

Distinct: 4
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 10.2 KiB
Common values
Value | Count | Frequency (%)
Used | 686 | 52.6%
New | 467 | 35.8%
Not Applicable | 150 | 11.5%
Under Construction | 1 | 0.1%

Frequencies of value counts

Unique

Unique: 1
Unique (%): 0.1%

Histogram of lengths of the category

Length

Max length: 18
Median length: 4
Mean length: 4.80291411
Min length: 3

price
Real number (ℝ≥0)

Distinct: 356
Distinct (%): 27.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 66895.77147
Minimum: 280
Maximum: 340000
Zeros: 0
Zeros (%): 0.0%
Memory size: 10.2 KiB

Quantile statistics

Minimum: 280
5-th percentile: 39075
Q1: 52750
median: 65000
Q3: 77000
95-th percentile: 101683.2
Maximum: 340000
Range: 339720
Interquartile range (IQR): 24250

Descriptive statistics

Standard deviation: 23381.71764
Coefficient of variation (CV): 0.3495245982
Kurtosis: 26.66177476
Mean: 66895.77147
Median Absolute Deviation (MAD): 12000
Skewness: 3.273071013
Sum: 87232086
Variance: 546704719.9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
65000 | 48 | 3.7%
70000 | 42 | 3.2%
75000 | 42 | 3.2%
55000 | 30 | 2.3%
60000 | 28 | 2.1%
80000 | 27 | 2.1%
58000 | 26 | 2.0%
43000 | 26 | 2.0%
45000 | 26 | 2.0%
62000 | 25 | 1.9%
Other values (346) | 984 | 75.5%

Minimum 5 values
Value | Count | Frequency (%)
280 | 1 | 0.1%
26000 | 1 | 0.1%
26500 | 1 | 0.1%
27000 | 1 | 0.1%
28000 | 1 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
340000 | 1 | 0.1%
300000 | 1 | 0.1%
250000 | 1 | 0.1%
210000 | 1 | 0.1%
199000 | 1 | 0.1%

interior_area
Real number (ℝ≥0)

Distinct: 135
Distinct (%): 10.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 98.48389571
Minimum: 40
Maximum: 427
Zeros: 0
Zeros (%): 0.0%
Memory size: 10.2 KiB

Quantile statistics

Minimum: 40
5-th percentile: 63
Q1: 76
median: 99
Q3: 112
95-th percentile: 137
Maximum: 427
Range: 387
Interquartile range (IQR): 36

Descriptive statistics

Standard deviation: 31.57548499
Coefficient of variation (CV): 0.3206157186
Kurtosis: 24.61358459
Mean: 98.48389571
Median Absolute Deviation (MAD): 16
Skewness: 3.367160887
Sum: 128423
Variance: 997.0112523
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
100 | 43 | 3.3%
112 | 36 | 2.8%
108 | 32 | 2.5%
104 | 31 | 2.4%
98 | 30 | 2.3%
105 | 28 | 2.1%
106 | 28 | 2.1%
90 | 28 | 2.1%
70 | 27 | 2.1%
109 | 27 | 2.1%
Other values (125) | 994 | 76.2%

Minimum 5 values
Value | Count | Frequency (%)
40 | 1 | 0.1%
44 | 1 | 0.1%
45 | 2 | 0.2%
48 | 4 | 0.3%
50 | 2 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
427 | 1 | 0.1%
385 | 1 | 0.1%
371 | 1 | 0.1%
342 | 1 | 0.1%
300 | 1 | 0.1%

gros_area
Real number (ℝ≥0)

Distinct: 144
Distinct (%): 11.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 101.5138037
Minimum: 44
Maximum: 427
Zeros: 0
Zeros (%): 0.0%
Memory size: 10.2 KiB

Quantile statistics

Minimum: 44
5-th percentile: 64
Q1: 77
median: 100
Q3: 112
95-th percentile: 145
Maximum: 427
Range: 383
Interquartile range (IQR): 35

Descriptive statistics

Standard deviation: 37.36028721
Coefficient of variation (CV): 0.3680315962
Kurtosis: 21.70282905
Mean: 101.5138037
Median Absolute Deviation (MAD): 16
Skewness: 3.646046779
Sum: 132374
Variance: 1395.79106
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
100 | 39 | 3.0%
112 | 36 | 2.8%
108 | 32 | 2.5%
98 | 31 | 2.4%
104 | 31 | 2.4%
105 | 29 | 2.2%
106 | 29 | 2.2%
109 | 27 | 2.1%
103 | 26 | 2.0%
90 | 26 | 2.0%
Other values (134) | 998 | 76.5%

Minimum 5 values
Value | Count | Frequency (%)
44 | 1 | 0.1%
45 | 1 | 0.1%
48 | 3 | 0.2%
50 | 1 | 0.1%
51 | 1 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
427 | 1 | 0.1%
393 | 1 | 0.1%
391 | 1 | 0.1%
385 | 1 | 0.1%
371 | 1 | 0.1%

bedrooms
Real number (ℝ≥0)

Distinct: 6
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.769171779
Minimum: 0
Maximum: 5
Zeros: 4
Zeros (%): 0.3%
Memory size: 10.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 2
Q3: 2
95-th percentile: 3
Maximum: 5
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.5964339574
Coefficient of variation (CV): 0.337126086
Kurtosis: 0.783139381
Mean: 1.769171779
Median Absolute Deviation (MAD): 0
Skewness: 0.239167673
Sum: 2307
Variance: 0.3557334655
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)

Common values
Value | Count | Frequency (%)
2 | 806 | 61.8%
1 | 397 | 30.4%
3 | 91 | 7.0%
4 | 5 | 0.4%
0 | 4 | 0.3%
5 | 1 | 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 4 | 0.3%
1 | 397 | 30.4%
2 | 806 | 61.8%
3 | 91 | 7.0%
4 | 5 | 0.4%

Maximum 5 values
Value | Count | Frequency (%)
5 | 1 | 0.1%
4 | 5 | 0.4%
3 | 91 | 7.0%
2 | 806 | 61.8%
1 | 397 | 30.4%

bathrooms
Real number (ℝ≥0)

ZEROS

Distinct: 6
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.435582822
Minimum: 0
Maximum: 21
Zeros: 15
Zeros (%): 1.2%
Memory size: 10.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 1
Q3: 2
95-th percentile: 2
Maximum: 21
Range: 21
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7569524049
Coefficient of variation (CV): 0.5272788119
Kurtosis: 341.6391126
Mean: 1.435582822
Median Absolute Deviation (MAD): 0
Skewness: 13.33526062
Sum: 1872
Variance: 0.5729769433
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)

Common values
Value | Count | Frequency (%)
1 | 731 | 56.1%
2 | 553 | 42.4%
0 | 15 | 1.2%
4 | 2 | 0.2%
3 | 2 | 0.2%
21 | 1 | 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 15 | 1.2%
1 | 731 | 56.1%
2 | 553 | 42.4%
3 | 2 | 0.2%
4 | 2 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
21 | 1 | 0.1%
4 | 2 | 0.2%
3 | 2 | 0.2%
2 | 553 | 42.4%
1 | 731 | 56.1%

other_rooms
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 5
Distinct (%): 0.5%
Missing: 309
Missing (%): 23.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.02713567839
Minimum: 0
Maximum: 4
Zeros: 977
Zeros (%): 74.9%
Memory size: 10.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 4
Range: 4
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2293094714
Coefficient of variation (CV): 8.450478669
Kurtosis: 142.8931281
Mean: 0.02713567839
Median Absolute Deviation (MAD): 0
Skewness: 10.92900813
Sum: 27
Variance: 0.05258283369
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=5)

Common values
Value | Count | Frequency (%)
0 | 977 | 74.9%
1 | 12 | 0.9%
2 | 4 | 0.3%
3 | 1 | 0.1%
4 | 1 | 0.1%
(Missing) | 309 | 23.7%

Minimum 5 values
Value | Count | Frequency (%)
0 | 977 | 74.9%
1 | 12 | 0.9%
2 | 4 | 0.3%
3 | 1 | 0.1%
4 | 1 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
4 | 1 | 0.1%
3 | 1 | 0.1%
2 | 4 | 0.3%
1 | 12 | 0.9%
0 | 977 | 74.9%

Interactions

[49 interaction plots (SVG images) omitted]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
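
As an illustration of the calculation described above, here is a minimal pure-Python sketch. The function name `pearson_r` and the toy data are illustrative and do not come from the report; in practice a library routine would normally be used.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/(n - 1) factors of the covariance and the variances cancel out,
    # so they are omitted here.
    return cov / math.sqrt(var_x * var_y)


# Illustrative toy data (not taken from the report).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(x, y)
# Shifting and positively rescaling a variable leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], y)
print(round(r, 4), round(r_rescaled, 4))
```

Negating one of the variables flips the sign of r but not its magnitude.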

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
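
In other words, ρ is Pearson's r applied to ranks. The following is a small sketch of that idea, assuming tied values receive the average of their rank positions (a common convention, not confirmed by this report). The helper names `average_ranks`, `pearson_r` and `spearman_rho` are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# y = x**3 is monotonic but not linear (illustrative toy data).
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_rho(x, y), round(pearson_r(x, y), 4))
```

On this toy data ρ reaches its maximum because the relationship is perfectly monotonic, while Pearson's r stays below 1 because it is not linear.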

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
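
The pair-counting definition above translates directly into code. This sketch implements exactly that formula (often called tau-a); the function name `kendall_tau_a` and the toy data are illustrative. Library implementations commonly use a tie-adjusted variant (tau-b), so on heavily tied columns such as bedrooms or bathrooms the values plotted in a report may differ from what this sketch returns.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; the pair counts as neither
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Illustrative toy data (not taken from the report).
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

For this toy data there are 7 concordant and 3 discordant pairs out of 10, so τ = 0.4.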

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

[3 missing-value plots (SVG images) omitted]

Sample

First rows

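(row) | df_index | property_status | price | interior_area | gros_area | bedrooms | bathrooms | other_rooms
0 | 20102 | New | 66000.0 | 110.0 | 110.0 | 2 | 2.0 | 0.0
1 | 19641 | Used | 65000.0 | 119.0 | 119.0 | 2 | 2.0 | 0.0
2 | 20146 | New | 39000.0 | 69.0 | 69.0 | 1 | 1.0 | 0.0
3 | 20100 | New | 63000.0 | 103.0 | 103.0 | 2 | 2.0 | 0.0
4 | 19650 | New | 41000.0 | 75.0 | 75.0 | 1 | 1.0 | 0.0
5 | 19686 | New | 46000.0 | 70.0 | 70.0 | 1 | 1.0 | 0.0
6 | 20105 | New | 52000.0 | 98.0 | 98.0 | 2 | 2.0 | 0.0
7 | 13781 | Used | 55000.0 | 61.0 | 61.0 | 1 | 1.0 | NaN
8 | 11496 | Not Applicable | 100000.0 | 114.0 | 114.0 | 2 | 2.0 | NaN
9 | 17142 | Used | 59000.0 | 75.0 | 120.0 | 1 | 1.0 | 0.0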

Last rows

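(row) | df_index | property_status | price | interior_area | gros_area | bedrooms | bathrooms | other_rooms
1294 | 826 | Used | 49000.0 | 64.0 | 64.0 | 1 | 1.0 | 0.0
1295 | 19365 | New | 66960.0 | 112.0 | 112.0 | 2 | 2.0 | 0.0
1296 | 19204 | Used | 50000.0 | 89.0 | 89.0 | 1 | 1.0 | 0.0
1297 | 19366 | New | 64350.0 | 117.0 | 117.0 | 2 | 2.0 | 0.0
1298 | 17373 | Used | 66960.0 | 108.0 | 108.0 | 2 | 2.0 | 0.0
1299 | 11572 | Not Applicable | 72000.0 | 90.0 | 90.0 | 2 | 2.0 | NaN
1300 | 17374 | Used | 45500.0 | 73.0 | 73.0 | 1 | 1.0 | 0.0
1301 | 18158 | Used | 46600.0 | 78.0 | 78.0 | 2 | 1.0 | 0.0
1302 | 19367 | New | 39600.0 | 66.0 | 66.0 | 1 | 1.0 | 0.0
1303 | 19368 | Used | 63000.0 | 103.0 | 103.0 | 2 | 1.0 | 0.0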